Type I Error Rate and Power of Rank Transform
ANOVA When Populations are Non-Normal and 
Have Equal Variance 



ABSTRACT. The rank transformation approach to
analysis of variance as a solution to the
Behrens-Fisher problem is examined. Using
simulation methodology, four parameters were
manipulated for the two-group design: (1)
ratio of population variances; (2) distribution
form; (3) sample size; and (4) population
mean difference. The results indicated
that while the rank transform approach was
less sensitive to variance inequality than the
parametric ANOVA F-ratio, unacceptably high
Type I error rates were obtained when cell
frequencies and group variances were inversely
related. With equal cell frequencies and/or
when cell frequencies were directly related to
group variances, appropriate Type I error
rates were obtained. Under these conditions,
however, the Brown-Forsythe procedure for
comparing group means provided greater power
except when the sampled distribution was
leptokurtic.



Both empirical and analytic studies have repeatedly
shown that parametric analysis procedures for
comparing group means are extremely sensitive to
population variance inequality when sample sizes are
markedly unequal. When sample size and group variance
are positively correlated, the actual Type I error
rate falls below the nominal significance level (the
test is conservative), while with a negative
relationship between sample size and group variance,
the actual Type I error rate exceeds the nominal
significance level (the test is liberal)
(Glass, Peckham and Sanders, 1972). Even with equal
sample sizes, Ramsey (1980) has shown that the actual 
probability of a Type I error for the t-test may 
either over or underestimate the nominal significance 
level. Developing alternative data analysis 
strategies when population variances differ, also 
known as the Behrens-Fisher problem, has therefore 
been an area of considerable interest, and several 
solutions to the problem have been suggested. The 
procedure offered by Welch (1947, 1951), in 
particular, has gained considerable attention. 
Welch's solution modifies the ANOVA F-ratio by 
weighting the sample means by the ratio of the group 
frequency to the group variance. In addition the 
degrees of freedom error are adjusted so that the 
computed statistic approximates the F distribution. 
Wang (1971) has shown that this approximation is 
satisfactory for most situations. James (1951) 
suggested a similar weighting procedure but used the 
chi-square distribution as the reference distribution. 

Recently Brown and Forsythe (1974a) have suggested a 
slightly different approach to the Behrens-Fisher 
problem. Their statistic takes the ratio of the sums 
of squares between groups to a weighted sum of group 
variances. The test statistic has an approximate F
distribution. For the two-group case, the Welch and
the Brown-Forsythe procedures are identical
(Brown and Forsythe, 1974a), but they differ when
multiple groups are compared. Both procedures have been
generalized for factorial designs (Brown and Forsythe,
1974b; Johansen, 1980; Algina and Olejnik, 1984).

A number of investigations have studied both of 
these strategies and compared them with respect to 
their Type I error rates and statistical power. The
results of these studies have shown that both
approaches are insensitive to variance inequality when
the sampled distributions were normal (Kohr and Games,
1974; Brown and Forsythe, 1974a; Levy, 1978; Dijkstra
and Werter, 1981; Lee and Fung, 1983). Under
non-normal parent distributions the Brown-Forsythe
statistic was shown to provide appropriate Type I
error rates (Clinch and Keselman, 1982; Lee and Fung,
1983). The results with Welch's approach, however, have
been mixed and inconsistent. There is some evidence 
to indicate that the approach is liberal for skewed 
distributions when the number of levels of the 
grouping factor is four or more (Clinch and Keselman, 
1982; Levy, 1978). Levy on the other hand found 
appropriate Type I error rates when there were three 
levels of the independent variable and the sampled 
population had a chi-square distribution. When data 
were sampled from heavy-tailed distributions, some 
results have indicated that Welch's procedure provides 
a conservative test of group means (Yuen, 1974; Lee 
and Fung, 1983). Other evidence however indicates 
appropriate Type I error rates (Clinch and Keselman, 
1982). Differences in these conclusions may be a 
function of the degree to which the sampled 
populations departed from normality. Finally for 
light-tailed distributions, the Welch procedure has 
been shown to provide a liberal test of means (Levy, 
1978), but other results indicate that appropriate 
Type I error rates are possible (Yuen, 1974). 

When the procedures were compared in terms of their
statistical power, the results have been mixed but
generally consistent. Both procedures provide
comparable power when the population distributions are
normal and the variances are equal. Under this
condition both procedures are only slightly less
powerful than the ANOVA F-ratio (Brown and Forsythe,
1974a; Dijkstra and Werter, 1981; Clinch and Keselman,
1982; Lee and Fung, 1983). With unequal variances and
a normal parent distribution, the Brown-Forsythe
approach provided greater power when the extreme mean
had the lower variance, while the Welch procedure was
more sensitive if the extreme mean had the larger variance
(Brown and Forsythe, 1974a; Dijkstra and Werter, 1981;
Lee and Fung, 1983). Clinch and Keselman, however,
found very little difference in statistical power
between the procedures for this condition. The
statistical power for all of the procedures studied
was relatively low, however, and that may explain the
inconsistency in the results reported by Clinch and
Keselman. Finally, for heavy-tailed distributions the
Welch procedure provided a slight power advantage
(Clinch and Keselman, 1982; Lee and Fung, 1983).

Recently, Dauphin (1983) considered a different
approach to the Behrens-Fisher problem. She suggested
transforming the original data by using ranks before
group means are compared. After ranking the data from
highest to lowest across all comparison groups, the
parametric analysis of variance F-ratio is computed.
This strategy of transforming data using ranks before
computing parametric analyses has been suggested by
Conover and Iman (1981) as a linking procedure between
parametric and nonparametric analysis strategies.
They have suggested that the ranking approach can be
used in a variety of research contexts, and
considerable research has been conducted using this
approach, generally with positive results. Nath and
Duran (1981a, 1981b) studied the procedure when two
groups were compared, Conover and Iman
(1982) applied the approach to analysis of covariance,
Iman and Conover (1979) used the rank transformation
in a regression problem, and Iman (1974) studied the
approach for factorial designs when an interaction was
present. Although the theoretical rationale for the
procedure is not fully developed, progress in that
area has been reported by Iman, Hora and Conover
(1984).

The use of the rank transformation has been
motivated primarily as an alternative analysis
strategy to parametric statistics when sampled
distributions were non-normal. In this context the
rank transformation has often provided a more
sensitive test of the location parameter than the
parametric alternative. The rationale for applying the
rank transformation as a solution to the
Behrens-Fisher problem was based on previous findings
that nonparametric strategies, while affected by
variance inequality, are less sensitive than the
parametric alternatives (Wetherill, 1960). Glazer
(1963), for example, empirically demonstrated
Wetherill's asymptotic results showing that the
Wilcoxon-Mann-Whitney probability of a Type I error
was less affected than that of Student's t-test for
independent sample means when the population variances
differed. Since the rank transform is monotonically
related to the Wilcoxon test, Dauphin expected similar
conclusions. Her results confirmed her expectation,
showing that the actual Type I error rate for the rank
transform did not deviate greatly from the nominal
significance level when the sampled population was
normal.

Since the rank transformation has gained
considerable interest and Dauphin's results indicate
that the approach may have some merit in some
situations, it was decided to examine this proposed
solution to the Behrens-Fisher problem more
closely. Specifically, the purpose of the study was to
compare the empirical Type I error rates of the rank
transform ANOVA with those of the parametric analysis
of variance and the Brown-Forsythe procedure when
population variances differed and the distributions
were normal or non-normal. In addition, for those
situations where appropriate Type I error rates were
observed, the statistical power estimates for small,
medium and large differences in group means were
compared.

Computer Simulation 

In order to calculate empirical Type I error rates
and statistical power estimates for each of the
competing analysis strategies under a variety of
conditions, four factors were manipulated: (1) sample
size; (2) distribution form; (3) population mean
difference; and (4) population variance inequality.
Although all three of the procedures can be used for
comparing group means in multiple-group designs,
including factorial designs, the present investigation
was limited to comparisons between two groups.

Sample Size. Samples of (10,15), (15,10), (20,20),
(17,23), and (23,17) were included in the
investigation. The sample sizes considered here were
thought to be moderate and representative of those
often found in research studies in the social
sciences. Small departures from equal n's were chosen
to represent common attrition rates in social
research.

Distribution Form. A normal and four non-normal
parent distributions were considered. The non-normal
distributions included a light-tailed, platykurtic
distribution, a symmetric, leptokurtic (heavy-tailed)
distribution, a moderately skewed distribution, and a
distribution which was both skewed and leptokurtic.
The population characteristics of these distributions
are discussed in the data generation section.

Population Mean Difference. To study the Type I
error rates of the three procedures, data were 
generated from populations which had a common mean. 
Power estimates were obtained by comparing the 
proportion of hypotheses rejected when data were 
sampled from populations which differed by .2, .5, or 
.8 pooled standard deviation units. These differences 
in the location parameters have been suggested by 
Cohen (1977) as representing small, medium, and large 
effects respectively. 

Population Variances. The present study considered
populations which had common variances as well as six
levels of variance inequality. Specifically, data were
generated from populations with the following variance
pairs: (1,1), (1,1.5), (1,2.0), (1,2.5), (1,3.0),
(1,3.5), and (1,4.0). The choice of these variance
differences was based on two considerations. First, it
was believed that the conditions considered reflected
common situations encountered by applied researchers.
Second, it was believed that with the unequal
sample size combinations studied, the variance
differences would affect the Type I error rate of the
parametric ANOVA F-ratio.

Data Generation. Data for the study were generated
using the SAS computing package. Scores on the
dependent measure were created based on the linear
model $Y_{ij} = \mu_{..} + \alpha_j + \epsilon_{ij}$, where $Y_{ij}$ is
the $i$th observation in the $j$th group. The grand mean
$\mu_{..}$ was set equal to 10. The effect size parameter
for the $j$th group, $\alpha_j$, was set to 0, .2, .5, or
.8 pooled standard deviation units to study the effect
of population mean difference. In all cases the shift
parameter was added to the second group so that
$\mu_1 \le \mu_2$. The random error component $\epsilon_{ij}$ was
generated using the SAS NORMAL function to simulate
scores, $X_{ij}$, from a standard normal distribution. For
a normally distributed error component, $\epsilon_{ij}$ was set
equal to $X_{ij}$. For a non-normally distributed
component, $X_{ij}$ was transformed using a power function
developed by Fleishman (1978):

$$\epsilon_{ij} = [(dX_{ij} + c)X_{ij} + b]X_{ij} + a.$$

The constants a, b, c and d
were chosen to transform the standard normal variable
to a variable with known skewness and kurtosis,
zero mean and unit variance. Four non-normal
distributions were considered in the study.
Descriptive statistics and frequency distributions at
half standard deviation intervals are included in
Table 1. Values reported in the table are based on
20,000 random variables generated for each
distribution. The variance of the observations in
group one was kept constant at 1 for all conditions
studied, while the variance of the second group was
increased from 1 to 4 in increments of .5 units by
multiplying the random error component by the desired
standard deviation.
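
The data generation steps described above can be sketched in Python (the study itself used SAS). This is only an illustration under stated assumptions, not the authors' program: the function names are hypothetical, the mean shift is passed directly in raw score units, and the default constants (a, b, c, d) = (0, 1, 0, 0) are the identity transform, since the Fleishman constants used in the study are not reported in the text.

```python
import random


def fleishman(x, a, b, c, d):
    """Fleishman (1978) power transform in nested form: [(d*x + c)*x + b]*x + a."""
    return ((d * x + c) * x + b) * x + a


def generate_two_groups(n1, n2, shift, var2, coef=(0.0, 1.0, 0.0, 0.0),
                        grand_mean=10.0, rng=None):
    """Draw one two-group sample.

    Group 1 has error variance 1; group 2 has error variance `var2` and its
    mean is raised by `shift` (hypothetical parameter, raw score units).
    `coef` holds illustrative constants (a, b, c, d); the default is the
    identity transform, which leaves the errors standard normal.
    """
    rng = rng or random.Random()
    a, b, c, d = coef

    def draw(n, effect, sd):
        # Y = grand mean + group effect + sd * transformed standard normal score
        return [grand_mean + effect
                + sd * fleishman(rng.gauss(0.0, 1.0), a, b, c, d)
                for _ in range(n)]

    # The error component of group 2 is multiplied by its standard deviation.
    return draw(n1, 0.0, 1.0), draw(n2, shift, var2 ** 0.5)
```

For example, `generate_two_groups(10, 15, 0.0, 4.0)` would correspond to the (10,15) sample size condition with variance pair (1,4.0) and no mean difference.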

Computed Test Statistics. In each sample generated,
the group means were compared using the parametric 
analysis of variance F-ratio, the Brown-Forsythe 
(1974a) test statistic, and the rank transform 
analysis of variance F-ratio. 

The parametric analysis of variance F-ratio is
computed as the ratio of the mean square between group
means to the pooled within group variance:

$$F = \frac{\sum_j n_j(\bar{Y}_{.j} - \bar{Y}_{..})^2/(J-1)}{\sum_j (n_j - 1)S_j^2/(N-J)}$$

where $n_j$ is the number of observations in the $j$th
group; $N$ is the total number of observations in the
sample; $J$ is the number of groups in the study; $\bar{Y}_{.j}$ is
the sample mean for the $j$th group; $\bar{Y}_{..}$ is the grand
mean; and $S_j^2$ is the variance of the $j$th group. The
critical test statistic is obtained from the F
distribution with $J-1$ and $N-J$ degrees of freedom.
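
As a minimal sketch of the parametric F-ratio defined above, the following Python function computes the statistic and its degrees of freedom from a list of groups. The function name and the list-of-lists input are assumptions made for the illustration; the comparison with the critical F value is left out because it requires an F distribution table or a statistics library.

```python
def anova_f(groups):
    """Return (F, df_between, df_within) for a one-way layout.

    `groups` is a list of lists of scores, one inner list per group.
    """
    n_total = sum(len(g) for g in groups)
    n_groups = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Numerator sum of squares: sum_j n_j (Ybar_j - Ybar)^2
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Denominator sum of squares: sum_j (n_j - 1) S_j^2, i.e. the
    # within-group sum of squared deviations.
    ss_within = sum(sum((y - m) ** 2 for y in g)
                    for g, m in zip(groups, means))

    df_between = n_groups - 1
    df_within = n_total - n_groups
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within
```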

Table 1

Frequency Distributions and Descriptive Statistics

                                   Distributions
                                                                   Skewed/
Interval        Normal   Platykurtic    Skewed   Leptokurtic   Leptokurtic

-∞, -3.0            17                                   151
-3.0, -2.5          85                                   119
-2.5, -2.0         232                                   301
-2.0, -1.5         889          1552                     601
-1.5, -1.0        1885          2297      3605          1257
-1.0, -0.5        2?70          2917      3976          2816          8555
-0.5,  0.0        3826          3235      3591          4745          4219
 0.0,  0.5        3817          3177      2053          4753          2577
 0.5,  1.0        3038          2805      2345          2748          1777
 1.0,  1.5        18?9          2411      1552          1343          1142
 1.5,  2.0         855          1606      1039           586           671
 2.0,  2.5         332                     520           263           440
 2.5,  3.0          86                     230           178           268
 3.0,  ∞            19                      89           139           351

Mean            -.0015         .0049     .0009         .0004        -.0063
Variance         .9836        1.0109    1.0631        1.0292         .9774
Skewness         .000?        -.0005     .7266        -.1297        1.6820
Kurtosis        -.0938       -1.0131    -.0846        3.5547        3.1517

The rank transform ANOVA F-ratio is computed using
the same formula as the parametric ANOVA, with the
dependent variable obtained by replacing the original
observations with the rank of the observation. The
observations are ranked by assigning a 1 to the lowest
observation and N to the highest observation in the
total sample across all groups. If ties are present,
the average rank is assigned to all tied observations.
The F-ratio is computed as:

$$F_R = \frac{\sum_j n_j(\bar{R}_{.j} - \bar{R}_{..})^2/(J-1)}{\sum_j (n_j - 1)S_{Rj}^2/(N-J)}$$

where $\bar{R}_{.j}$ is the mean rank of group $j$; $\bar{R}_{..}$ is the
grand mean rank, $(N+1)/2$; and $S_{Rj}^2$ is the variance of the
ranks for the $j$th group. The critical test statistic
for the rank transform F-ratio is the same as that for
the parametric ANOVA.
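
The ranking rule and the rank transform F-ratio described above can be illustrated with a short Python sketch. The function names are hypothetical, and the code simply pools the groups, assigns average ranks to ties, and applies the usual one-way F-ratio to the ranks.

```python
def midranks(values):
    """Rank values from 1 (lowest) to N (highest), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2.0 + 1.0  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def rank_transform_f(groups):
    """One-way F-ratio computed on the pooled ranks of the observations."""
    pooled = [y for g in groups for y in g]
    ranks = midranks(pooled)

    # Split the pooled ranks back into their original groups.
    ranked, pos = [], 0
    for g in groups:
        ranked.append(ranks[pos:pos + len(g)])
        pos += len(g)

    n_total, n_groups = len(pooled), len(groups)
    grand_mean = (n_total + 1) / 2.0  # grand mean rank
    means = [sum(r) / len(r) for r in ranked]
    ss_between = sum(len(r) * (m - grand_mean) ** 2 for r, m in zip(ranked, means))
    ss_within = sum(sum((x - m) ** 2 for x in r) for r, m in zip(ranked, means))
    return (ss_between / (n_groups - 1)) / (ss_within / (n_total - n_groups))
```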

The Brown-Forsythe statistic is obtained as the
ratio of the sum of squares between groups to a
weighted sum of within group variances:

$$F^* = \frac{\sum_j n_j(\bar{Y}_{.j} - \bar{Y}_{..})^2}{\sum_j (1 - n_j/N)S_j^2}$$

where the terms are defined as stated above. The
Brown-Forsythe F statistic has an approximate F
distribution with degrees of freedom $J-1$ and $f$, where

$$\frac{1}{f} = \sum_j \frac{c_j^2}{n_j - 1} \quad \text{and} \quad c_j = \frac{(1 - n_j/N)S_j^2}{\sum_k (1 - n_k/N)S_k^2}.$$
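
A minimal Python sketch of the Brown-Forsythe statistic and its approximate denominator degrees of freedom follows. The function name and return layout are assumptions for the illustration. With two groups the statistic equals the square of Welch's t, which is consistent with the equivalence of the two procedures noted earlier for the two-group case.

```python
def brown_forsythe(groups):
    """Return (F*, df1, f) for the Brown-Forsythe test of equal means."""
    sizes = [len(g) for g in groups]
    n_total = sum(sizes)
    means = [sum(g) / len(g) for g in groups]
    # Unbiased group variances S_j^2.
    variances = [sum((y - m) ** 2 for y in g) / (len(g) - 1)
                 for g, m in zip(groups, means)]
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Numerator: between-groups sum of squares.
    numerator = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, means))
    # Denominator: weighted sum of group variances, weights (1 - n_j/N).
    weights = [(1.0 - n / n_total) * s2 for n, s2 in zip(sizes, variances)]
    denominator = sum(weights)

    # Approximate denominator degrees of freedom: 1/f = sum_j c_j^2 / (n_j - 1).
    c = [w / denominator for w in weights]
    f = 1.0 / sum(cj ** 2 / (n - 1) for cj, n in zip(c, sizes))
    return numerator / denominator, len(groups) - 1, f
```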



For each condition, 1000 replications of the three
statistics were computed, and the frequency with which
each procedure rejected the null hypothesis of equal
population means at the .05 level was recorded. In
evaluating the robustness of each procedure, it was
decided that observed proportions of Type I errors more
than two standard errors above or below the nominal
significance level would be judged as unacceptable.
Based on 1000 replications, observed Type I error
rates outside the interval (.036, .064) were
considered nonrobust.
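
The robustness interval follows from the binomial standard error of a proportion, sqrt(.05 x .95 / 1000), which is approximately .0069. A small Python check is sketched below; the function name and defaults are chosen for the illustration only.

```python
import math


def robustness_interval(alpha=0.05, replications=1000, n_se=2.0):
    """Interval alpha +/- n_se binomial standard errors of an observed proportion."""
    se = math.sqrt(alpha * (1.0 - alpha) / replications)
    return alpha - n_se * se, alpha + n_se * se


low, high = robustness_interval()
print(round(low, 3), round(high, 3))  # 0.036 0.064
```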
Results

The results of the study are reported in two
sections. In the first section the empirical Type I
error rates for the Brown-Forsythe, the parametric
ANOVA, and the rank transform ANOVA are presented for
increasing variance inequality with equal and unequal
sample size combinations. The second section presents
the proportions of hypotheses rejected when population
means differed by .2, .5, or .8 pooled standard deviation
units, representing small, medium, and large effect
sizes. The power results are reported only for those
conditions where appropriate Type I error rates were
obtained.

Type I Error Rates 

As a preliminary test of the computer program and 
the data generation procedure, data for three sample 
size combinations were generated from populations 
identical in their form, scale, and location. The 
sample means for these samples were compared using the 
three analysis procedures under consideration, and the 
proportion of hypotheses rejected at a nominal
significance level of .05 was recorded. Table 2
reports the results of these analyses for the five
distribution forms studied. None of the observed
proportions fell more than two standard errors above or
below the expected five percent level. These results
therefore support the adequacy of the data generation 
and analysis procedures used in the study. 

Table 2

Type I Error Rates for the Brown-Forsythe (BF),
Parametric ANOVA (F), and the Rank Transform (RF)
ANOVA

                                     Sample Size
                           15/10            20/20            23/17
Distribution            BF    F    RF    BF    F    RF    BF    F    RF

Normal                 .044 .046 .050   .044 .044 .043   .060 .064 .059
Platykurtic            .057 .058 .058   .056 .056 .052   .054 .051 .045
Skewed                 .050 .043 .050   .052 .052 .055   .055 .057 .054
Leptokurtic            .048 .051 .056   .041 .042 .046   .053 .056 .061
Skewed and Leptokurtic .045 .045 .053   .046 .048 .052   .044 .040 .045

Note: Nominal α = .05.

The observed Type I error rates, as the difference
in group variances increased, are reported in Table 3
for the five distribution forms with equal and unequal
sample size combinations. For the normal distribution
the results reported here are consistent with those
presented by Dauphin (1983). The Type I error rate
for the rank transform ANOVA was affected to a lesser
degree than that of the parametric ANOVA F-ratio.
However, for situations where the smaller samples had
greater variance, the proportions of Type I errors were
more than two standard errors above the nominal
significance level and were therefore judged as being
unacceptably high. When larger samples had greater
variance, the rank transform had acceptable Type I
error rates while the ANOVA F-ratio underestimated the
nominal significance level. With equal sample sizes,
neither the ANOVA F nor the rank transform was
seriously affected by variance inequality. The
Brown-Forsythe procedure provided appropriate Type I
error rates for all degrees of variance inequality and
sample size combinations.

Table 3

Type I Error Rates for the Brown-Forsythe (BF), Parametric ANOVA (F), and the
Rank Transform ANOVA (RF)

                                          Sample Size (n1/n2)
              Variance       10/15            15/10            20/20            17/23            23/17
Distribution   Ratio      BF    F    RF    BF    F    RF    BF    F    RF    BF    F    RF    BF    F    RF

Normal         1:1.5     .059 .049 .052   .050 .069 .066   .044 .045 .042   .047 .038 .050   .046 .051 .052
               1:2.0     .05? .041 .055   .055 .069 .066   .054 .057 .066   .057 .044 .042   .037 .045 .046
               1:2.5     .056 .040 .043   .048 .067 .068   .049 .049 .056   .046 .036 .047   .055 .068 .069
               1:3.0     .055 .014 .045   .053 .089 .079   .050 .052 .061   .044 .037 .045   .04? .060 .05?
               1:3.5     .049 .028 .037   .055 .085 .075   .046 .046 .049   .051 .034 .040   .063 .078 .074
               1:4.0     .044 .010 .041   .043 .084 .081   .067 .071 .080   .047 .034 .047   .043 .068 .?67

Platykurtic    1:1.5     .043 .040 .047   .060 .063 .068   .041 .043 .043   .044 .041 .048   .048 .056 .055
               1:2.0     .040 .028 .014   .050 .061 .066   .048 .048 .054   .045 .040 .044   .048 .056 .060
               1:2.5     .039 .024 .014   .060 .079 .078   .053 .055 .058   .053 .039 .047   .058 .071 .067
               1:3.0     .058 .040 .059   .052 .076 .075   .059 .059 .062   .055 .018 .046   .045 .066 .072
               1:3.5     .045 .027 .040   .052 .077 .078   .049 .051 .061   .056 .041 .058   .049 .081 .074
               1:4.0     .055 .035 .045   .059 .087 .094   .052 .053 .063   .050 .033 .051   .051 .078 .077

Skewed         1:1.5     .038 .039 .052   .060 .059 .069   .057 .057 .057   .048 .045 .055   .051 .060 .071
               1:2.0     .057 .045 .054   .054 .078 .090   .041 .041 .056   .055 .041 .060   .051 .057 .075
               1:2.5     .054 .049 .071   .065 .079 .089   .046 .047 .071   .057 .047 .072   .051 .072 .091
               1:3.0     .051 .015 .052   .057 .089 .088   .048 .049 .080   .047 .016 .065   .051 .070 .081
               1:3.5     .049 .015 .064   .048 .075 .098   .049 .050 .079   .061 .047 .086   .041 .068 .097
               1:4.0     .045 .027 .051   .054 .088 .?09   .051 .056 .079   .057 .040 .081   .059 .077 .110

Leptokurtic    1:1.5     .034 .027 .???   .049 .06? .061   .032 .032 .049   .033 .047 .034   .043 .037 .049
               1:2.0     .033 .042 .030   .0?? .033 .043   .043 .043 .053   .047 .033 .030   .033 .067 .062
               1:2.5     .046 .028 .041   .019 .070 .063   .051 .031 .048   .064 .047 .033   .049 .063 .033
               1:3.0     .046 .023 .038   .041 .073 .032   .030 .051 .037   .049 .040 .034   .030 .072 .069
               1:3.5     .036 .023 .037   .031 .088 .077   .045 .047 .048   .044 .029 .031   .061 .086 .071
               1:4.0     .040 .022 .048   .041 .080 .073   .037 .039 .042   .064 .039 .035   .031 .082 .079

Skewed and     1:1.5     .039 .040 .067   .068 .070 .103   .038 .039 .108   .033 .047 .089   .046 .033 .097
Leptokurtic    1:2.0     .046 .048 .093   .036 .062 .120   .043 .047 .119   .036 .032 .145   .053 .061 .149
               1:2.5     .030 .049 .??6   .067 .077 .147   .068 .069 .160   .0?9 .036 .137   .068 .073 .182
               1:3.0     .043 .043 .118   .071 .033 .130   .060 .060 .190   .042 .033 .133   .071 .079 .188
               1:3.5     .043 .039 .??0   .077 .006 .170   .038 .061 .192   .043 .038 .203   .077 .093 .209
               1:4.0     .036 .049 .149   .076 .?00 .173   .039 .062 .229   .066 .047 .192   .068 .089 .2?1

With symmetric, non-normal distributions the 
observed Type I error rates were similar to those 
obtained under the normal populations. The rank 
transform ANOVA had Type I error rates which were 
affected to a lesser degree than the parametric ANOVA 
F-ratio. Error rates within the acceptable range were 
obtained for the rank transform approach when sample 
sizes were equal and when the larger sample size had 
greater variance. When the sample with fewer 
observations had greater variance, the observed Type I 
error rate exceeded the nominal significance level by 
more than two standard errors. When samples were 
selected from skewed populations, the rank transform
approach had observed Type I error rates
overestimating the nominal significance level for all
sample size combinations except when samples of 
(10,15) were selected. Under the latter condition, 
appropriate Type I error rates were obtained. The 
Type I error rates for the ANOVA F-ratio were as 
expected and were similar to those obtained under 
normal distributions. With the skewed and leptokurtic 
distribution, the rank transform became quite liberal, 
even for the condition with sample sizes of (10,15). 
Again Type I error rates for the parametric F were 
similar to those obtained with the normal 
distribution. The Brown-Forsythe procedure 
overestimated the nominal significance level when the 
sampled distribution was both skewed and leptokurtic
and the larger variance was matched with the samples 
having fewer observations. When larger samples were 
matched with lower variance, appropriate Type I error 
rates were obtained. These results were consistent 
with those reported by Clinch and Keselman (1982) in 
their analysis of Welch's procedure. 

In summarizing these results, the observed Type I
error rate for the rank transform was affected to a
lesser degree than that of the parametric ANOVA F-ratio
when the sampled distributions were symmetric. This
conclusion is consistent with that predicted by
Wetherill (1960) and previously demonstrated for the
normal distribution by Dauphin (1983). With skewed
distributions, however, the observed Type I error rate
overestimated the nominal significance level. The
effect of variance inequality on the statistical power
of the rank transform ANOVA when the sampled
distribution was symmetric is presented in the next
section.

Statistical Power 

The proportions of hypotheses rejected when the
populations differed by a small, medium, or large
effect size (.2, .5, or .8 pooled standard deviation
units, respectively) are reported in Tables 4 and 5 for
the symmetric distributions studied when samples were
(20,20) and (17,23), respectively. With equal sample
sizes the Brown-Forsythe and parametric ANOVA provided
comparable power estimates for all three symmetric
distributions. When sample sizes were unequal the
Brown-Forsythe procedure provided a more sensitive
test for the difference in population means for all
three distributions. These results were expected
since, under the conditions studied with unequal sample
sizes, the F-ratio leads to a conservative test.

Differences between power estimates for the rank
transform ANOVA and those provided by the
Brown-Forsythe and the parametric ANOVA procedures
were similar whether sample sizes were equal or unequal.
For the normal and platykurtic distributions, the rank 
transform ANOVA provided power estimates slightly 
lower than those of the other two procedures. When 
the sampled distribution was leptokurtic, however, the 
rank transform procedure provided a more sensitive 
test for the difference in population means than 
either the Brown-Forsythe or the parametric ANOVA. 

Table 4

Proportion of Hypotheses Rejected for the
Brown-Forsythe (BF), Parametric ANOVA (F) and the Rank
Transform ANOVA (RF)

                                     Distribution
Effect  Variance       Normal          Platykurtic       Leptokurtic
Size     Ratio      BF    F    RF     BF    F    RF     BF    F    RF

 .2      1:1.0     .098 .098 .090    .100 .101 .089    .100 .100 .115
         1:1.5     .096 .099 .103    .095 .096 .089    .065 .067 .091
         1:2.0     .095 .099 .096    .084 .086 .078    .099 .100 .106
         1:2.5     .087 .092 .089    .109 .110 .097    .094 .097 .096
         1:3.0     .101 .102 .109    .096 .039 .088    .107 .110 .131
         1:3.5     .091 .092 .099    .102 .104 .095    .095 .097 .???
         1:4.0     .084 .084 .090    .097 .101 .084    .110 .11? .109

 .5      1:1.0     .377 .378 .342    .304 .304 .265    .376 .378 .417
         1:1.5     .341 .342 .319    .328 .329 .287    .348 .351 .388
         1:2.0     .335 .344 .324    .324 .325 .275    .364 .364 .410
         1:2.5     .235 .338 .330    .326 .330 .269    .336 .339 .408
         1:3.0     .357 .363 .351    .345 .348 .281    .334 .344 .405
         1:3.5     .318 .326 .312    .298 .302 .236    .370 .373 .443
         1:4.0     .364 .369 .349    .341 .348 .273    .343 .352 .441

 .8      1:1.0     .718 .718 .678    .699 .700 .640    .697 .701 .766
         1:1.5     .696 .697 .662    .705 .707 .623    .725 .728 .806
         1:2.0     .678 .682 .644    .689 .694 .611    .719 .724 .798
         1:2.5     .696 .702 .656    .668 .676 .554    .596 .703 .776
         1:3.0     .682 .690 .637    .671 .678 .567    .697 .703 .786
         1:3.5     .683 .689 .661    .655 .663 .537    .706 .721 .800
         1:4.0     .666 .673 .641    .689 .694 .553    .689 .697 .779

Table 5

Proportion of Hypotheses Rejected for the
Brown-Forsythe (BF), Parametric ANOVA (F), and the Rank
Transform ANOVA (RF)

                                     Distribution
Effect  Variance       Normal          Platykurtic       Leptokurtic
Size     Ratio      BF    F    RF     BF    F    RF     BF    F    RF

 .2      1:1.0     .097 .095 .090    .082 .082 .082    .078 .078 .39?
         1:1.5     .115 .108 .110    .087 .078 .077    .080 .072 .097
         1:2.0     .104 .089 .063    .093 .076 .086    .098 .082 .120
         1:2.5     .095 .070 .086    .080 .064 .0??    .100 .077 .117
         1:3.0     .039 .067 .083    .102 .084 .091    .125 .099 .143
         1:3.5     .102 .074 .088    .104 .079 .089    .119 .091 .112
         1:4.0     .101 .085 .091    .101 .073 .075    .112 .068 .113

 .5      1:1.0     .350 .342 .327    .304 .310 .281    .369 .372 .420
         1:1.5     .331 .314 .304    .356 .329 .296    .382 .367 .429
         1:2.0     .384 .355 .339    .358 .314 .278    .365 .247 .417
         1:2.5     .365 .321 .326    .365 .5?7 .234    .389 .344 .437
         1:3.0     .377 .324 .348    .350 .299 .252    .389 .337 .448
         1:3.5     .383 .324 .349    .356 .301 .270    .390 .345 .448
         1:4.0     .366 .295 .309    .371 .294 .260    .411 .346 .443

 .8      1:1.0     .666 .672 .625    .693 .698 .637    .671 .674 .762
         1:1.5     .707 .680 .662    .691 .669 .617    .731 .718 .792
         1:2.0     .742 .700 .699    .711 .683 .620    .753 .712 .805
         1:2.5     .738 .638 .667    .742 .698 .610    .753 .715 .823
         1:3.0     .754 .679 .689    .737 .685 .610    .772 .714 .819
         1:3.5     .743 .688 .675    .736 .679 .598    .740 .700 .790
         1:4.0     .772 .708 .712    .741 .638 .592    .761 .709 .834

Conclusions

The results of the study indicate that the rank
transformation approach to analysis of variance can
provide a solution to the Behrens-Fisher problem, but
this solution is appropriate only for a limited set of
conditions. In particular, the rank transform ANOVA
may be recommended when sample frequencies are
positively related to group variances and the form of
the population distribution is leptokurtic. Under
that condition the actual Type I error rate does not
overestimate the nominal significance level and the
rank transform provides a slight power advantage over
the Brown-Forsythe solution. This result was
interesting in that the power advantage for the rank
transform procedure was obtained even though the
actual Type I error rate slightly underestimated the
nominal significance level. These results indicate
that Type I error rate alone should not be used to
evaluate or compare statistical analysis strategies.
On the other hand, consideration of both statistical
power and actual Type I error rates does provide minimum
criteria for judging the usefulness of analysis
alternatives.

For other symmetric distributions the rank transform 
procedure did not provide any statistical advantage
compared to the Brown-Forsythe procedure. With skewed
population distributions, however, the rank transform
approach overestimated the nominal significance level
even when the sample frequencies were equal. This
finding may be viewed as an important limitation of 
the rank transform strategy. 

As a general solution to the group variance
inequality problem, the results of this study do not
provide sufficient evidence to recommend any single
analysis approach. Before computing hypothesis tests,
researchers should first obtain descriptive summary
statistics to determine the sample distribution
characteristics and use this information to guide
their choice of analysis procedures. For most
situations where the population variances differ, the
Brown-Forsythe procedure can be used to compare means.
This procedure has been shown in the present study, as
well as in previous investigations, to be generally
robust to variance inequality and to provide
statistical power comparable to or greater than
parametric analysis of variance. There is some
evidence, however, which indicates that when the sampled
distributions are both skewed and leptokurtic, the
Brown-Forsythe procedure can overestimate the nominal
significance level if sample frequencies and group
variances are negatively related.